Standardization and Normalization

Avinash
February 06, 2024

Standardization and Normalization

Standardization

  • What it is: Standardization rescales data to have a mean (μ) of 0 and a standard deviation (σ) of 1 (unit variance).
  • How it's done: The process involves subtracting the mean of the data and then dividing by the standard deviation.
  • Formula: For a data point x, the standardized value z is calculated as $z = \frac{x - \mu}{\sigma}$.
  • Purpose: It's used to align the features on the same scale. This is especially important for models that are sensitive to the scale of input data, like SVMs, k-nearest neighbors, and neural networks.
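The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `standardize` and the sample values are made up for the example, and it assumes the population standard deviation (dividing by n) is the one intended.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation for each x."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]

# Hypothetical house sizes in square feet (illustrative values only).
sizes_sqft = [850.0, 1200.0, 1500.0, 2100.0, 3000.0]
z = standardize(sizes_sqft)

print(z)
print(statistics.fmean(z), statistics.pstdev(z))  # mean close to 0, std close to 1
```

The guard against a zero standard deviation matters in practice: a feature with the same value everywhere cannot be standardized with this formula.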

Example of Standardization

Imagine a dataset for house pricing, with features like size (in square feet) and number of bedrooms. These features are on very different scales: sizes differ by hundreds or thousands, while bedroom counts differ by a handful. Without scaling, size dominates the distance calculations in a model like k-NN. After standardization, both features contribute on a comparable footing.
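To make the house-pricing example concrete, here is a small sketch of how scaling can change which neighbor k-NN considers closest. The four houses, the helper names `standardize_columns` and `nearest`, and the use of Euclidean distance are all assumptions chosen for illustration.

```python
import math
import statistics

# Hypothetical houses: (size in square feet, number of bedrooms).
houses = [(1400.0, 2.0), (1450.0, 5.0), (1600.0, 2.0), (2400.0, 4.0)]

def standardize_columns(rows):
    """Standardize each feature (column) independently."""
    scaled_columns = []
    for col in zip(*rows):
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        scaled_columns.append([(x - mu) / sigma for x in col])
    return [tuple(row) for row in zip(*scaled_columns)]

def nearest(rows, query_index):
    """Index of the row closest to rows[query_index] by Euclidean distance."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(rows[i], query))

raw_neighbor = nearest(houses, 0)
scaled_neighbor = nearest(standardize_columns(houses), 0)

print(raw_neighbor, scaled_neighbor)
```

On the raw data, the 2-bedroom house at index 0 is matched with the 5-bedroom house at index 1, because a 50 square foot difference outweighs nothing else in the distance. After standardization it is matched with the other 2-bedroom house at index 2, since the bedroom difference now counts.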

Normalization

  • What it is: Normalization (or Min-Max Scaling) rescales the data to a fixed range, typically 0 to 1.
  • How it's done: The process involves subtracting the minimum value of the feature and then dividing by the range (maximum value - minimum value) of the feature.
  • Formula: For a data point x, the normalized value x' is calculated as $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$.
  • Purpose: It's used when features are bounded and need to be on a common scale. It's often used in image processing, where pixel intensities (originally 0 to 255) are rescaled to fit within a fixed range such as 0 to 1.
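The min-max formula above might look like this in plain Python. The function name `min_max_normalize` and the sample bedroom counts are illustrative assumptions, and the sketch only covers the 0 to 1 target range.

```python
def min_max_normalize(values):
    """Rescale values to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant feature")
    return [(x - lo) / (hi - lo) for x in values]

# Hypothetical bedroom counts (illustrative values only).
bedrooms = [1, 2, 3, 5]
scaled = min_max_normalize(bedrooms)

print(scaled)
```

The smallest value always maps to 0 and the largest to 1, so the result depends entirely on the two extreme observations.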

Example of Normalization

Consider a neural network processing images, where each pixel intensity ranges from 0 to 255. Normalizing these intensities to a range of 0 to 1 can make the network train more efficiently.
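For pixel data, a common shortcut is to divide by the known maximum intensity instead of computing the minimum and maximum of each image. The sketch below assumes 8-bit grayscale values and uses a tiny made-up image; `normalize_pixels` is a hypothetical helper name.

```python
# Tiny hypothetical 2x3 grayscale image with 8-bit intensities.
image = [
    [0, 64, 128],
    [191, 255, 32],
]

def normalize_pixels(img, max_intensity=255):
    """Map intensities from [0, max_intensity] to [0, 1]."""
    return [[pixel / max_intensity for pixel in row] for row in img]

normalized = normalize_pixels(image)

for row in normalized:
    print(row)
```

Because the bounds 0 and 255 are fixed by the image format, every image is scaled the same way, which would not be the case if each image used its own observed minimum and maximum.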

Key Differences

  • Scale: Standardization does not bound values to a specific range, which can be a drawback for algorithms that expect bounded inputs, as some neural networks do. Normalization bounds the data within a specified range (e.g., between 0 and 1).
  • Outliers: Standardization is generally less affected by outliers than normalization, although the mean and standard deviation are themselves pulled by extreme values. Since normalization scales data based on the minimum and maximum values, a single outlier can stretch the range and squeeze the remaining values into a narrow band.
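The effect of an outlier on both methods can be checked with a small experiment. This is a rough sketch on made-up numbers, with hypothetical helper names `min_max` and `z_scores`. It compares how much spread the typical values keep after each transformation once one extreme value is added.

```python
import statistics

def min_max(values):
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def z_scores(values):
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    return [(x - mu) / sigma for x in values]

typical = [10.0, 12.0, 14.0, 16.0, 18.0]
with_outlier = typical + [1000.0]  # one extreme value appended

mm = min_max(with_outlier)
zs = z_scores(with_outlier)

# Spread of the five typical values after each transformation.
typical_mm_spread = max(mm[:-1]) - min(mm[:-1])
typical_z_spread = max(zs[:-1]) - min(zs[:-1])

print(typical_mm_spread, typical_z_spread)
print(mm[-1], zs[-1])
```

In this example the typical values occupy less than 1% of the normalized range, while the outlier sits alone at 1. The z-scores are also distorted by the outlier, so neither method is robust on data like this. The sketch only illustrates the relative difference described above.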

Notes

  • Standardization is particularly useful when the features in your dataset have different units of measurement or vastly different scales. It's commonly used in algorithms that are sensitive to the variance in data, like Support Vector Machines (SVMs) and k-Nearest Neighbors (k-NN).

  • Normalization is useful when your data needs to lie in a fixed, bounded range, such as inputs to neural networks, which often train better on normalized data. It's also useful for image data processing where pixel intensities need to be normalized.

Standardization And Normalization


| Property | Standardization | Normalization |
|---|---|---|
| Definition | Rescales data to have a mean of 0 and a standard deviation of 1. | Rescales data to a fixed range, typically 0 to 1. |
| Formula | $z = \frac{x - \mu}{\sigma}$ | $x' = \frac{x - \min(x)}{\max(x) - \min(x)}$ |
| Scale | No fixed range; values centered around 0. | Fixed range (e.g., 0 to 1). |
| Sensitivity to Outliers | Less sensitive to outliers. | More sensitive to outliers. |
| Typical Use Cases | Models sensitive to the scale of input data (e.g., SVM, k-NN, Neural Networks). | Data bounded within a range, like pixel intensities in image processing. |
| Where to Use | In algorithms that assume data is centered and standardized, especially when features have different units/scales. | When data needs to be normalized to a specific scale, such as in neural network algorithms requiring input normalization. |
| Where Not to Use | May not be necessary for tree-based algorithms that are invariant to the scale of features. | Not ideal when the original distribution of the data is important or when outliers are critical to the analysis. |

© 2024 ClassFlame. All rights reserved.